Background: Next-generation sequencing (NGS) offers a unique opportunity for high-throughput genomics and has the potential to replace Sanger sequencing in many fields, including de novo sequencing, re-sequencing, metagenomics, and characterisation of infectious pathogens, such as viral quasispecies. Although methodologies and software for whole genome assembly and genome variation analysis have been developed and refined for NGS data, reconstructing a viral quasispecies using NGS data remains a challenge. This application would be useful for analysing intra-host evolutionary pathways in relation to immune responses and antiretroviral therapy exposures. Here we introduce a set of formulae for the combinatorial analysis of a quasispecies, given an NGS re-sequencing experiment, together with an algorithm for quasispecies reconstruction. We require that sequenced fragments are aligned against a reference genome, and that the reference genome is partitioned into a set of sliding windows (amplicons). The reconstruction algorithm is based on combinations of multinomial distributions and is designed to minimise the reconstruction of false variants, called in-silico recombinants.
Results: The reconstruction algorithm was applied to error-free simulated data and reconstructed a high percentage of true variants, even at low genetic diversity, where the chance of obtaining in-silico recombinants is high. Results on empirical NGS data from patients infected with hepatitis B virus confirmed its ability to characterise different viral variants from distinct patients.
Conclusions: The combinatorial analysis provided a description of the difficulty of reconstructing a quasispecies, given a determined amplicon partition and a measure of population diversity. The reconstruction algorithm showed good performance on both simulated and real data, even in the presence of sequencing errors.